
@jason810496
Member

@jason810496 jason810496 commented Aug 15, 2025

related:

Why

When called with the Accept: application/x-ndjson header, the get_log API should keep streaming logs until the task finishes. However, the FileTaskHandler can only read content that has already been flushed to the file; it stops at the current end of the file and never picks up content that is written afterwards.

What

The handler needs a polling mechanism that checks the log file for new content, so that the API connection stays open and keeps streaming to the frontend while the task is still writing.
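
A minimal sketch of such a polling loop, for illustration only. The function name `stream_file_until_idle` and its body are assumptions and are not the code added by this PR; only the `poll_interval` and `idle_timeout` parameter names and defaults are taken from the description. It re-reads the open file handle, yields complete lines as they appear, and stops once the file has been idle for longer than the timeout.

```python
import time
from collections.abc import Iterator
from pathlib import Path


def stream_file_until_idle(
    path: Path,
    poll_interval: float = 0.1,
    idle_timeout: float = 10.0,
) -> Iterator[str]:
    """Yield lines appended to `path` until it stays unchanged for `idle_timeout` seconds.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    last_activity = time.monotonic()
    buffer = ""
    with path.open("r", encoding="utf-8") as f:
        while True:
            # Reading again after EOF returns whatever was appended since the last read.
            chunk = f.read()
            if chunk:
                last_activity = time.monotonic()
                buffer += chunk
                # Emit only complete lines; keep a partial trailing line buffered.
                *lines, buffer = buffer.split("\n")
                for line in lines:
                    yield line
                continue
            if time.monotonic() - last_activity > idle_timeout:
                break
            time.sleep(poll_interval)
    if buffer:
        # Flush a final line that never received its newline.
        yield buffer
```

Because the function is a generator, an HTTP handler could wrap it in a streaming response and send each line as soon as it is yielded. The idle timeout is a heuristic: a task that stays silent for longer than `idle_timeout` would end the stream early, so a real implementation would likely also consult the task state.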

  • Add stream_file_until_close, with poll_interval = 0.1 seconds and idle_timeout = 10.0 seconds, which keeps streaming the file content until the file has been unchanged for longer than idle_timeout.
  • Add time-based flushing to _interleave_logs so that logs are flushed at a regular time interval even when the heap has not reached HEAP_DUMP_SIZE. This prevents display delays in the frontend.
  • Remove log_pos from log metadata.
  • Remove LogStreamAccumulator so that the API can emit log records as soon as they are available.
    • LogStreamAccumulator spills the log stream to a temp file in order to report the total number of log lines to the frontend, so it has to wait for the stream to end before it can replay it.
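
The time-based flushing described in the list above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the PR's `_interleave_logs`: the function name, the `flush_interval` parameter, the record shape `(timestamp, line)`, and the default values are all hypothetical; only the idea of a heap bounded by `HEAP_DUMP_SIZE` comes from the description.

```python
import heapq
import time
from collections.abc import Callable, Iterable, Iterator

HEAP_DUMP_SIZE = 5000   # illustrative value, assumed for this sketch
FLUSH_INTERVAL = 1.0    # hypothetical: seconds between forced flushes


def interleave_with_timed_flush(
    records: Iterable[tuple[float, str]],
    heap_dump_size: int = HEAP_DUMP_SIZE,
    flush_interval: float = FLUSH_INTERVAL,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Merge (timestamp, line) records by timestamp, flushing on size or elapsed time."""
    heap: list[tuple[float, int, str]] = []
    last_flush = clock()
    for seq, (ts, line) in enumerate(records):
        # seq breaks ties so records with equal timestamps keep arrival order.
        heapq.heappush(heap, (ts, seq, line))
        time_due = clock() - last_flush >= flush_interval
        size_due = len(heap) >= heap_dump_size
        if time_due or size_due:
            # Time-triggered: emit everything so the client is not kept waiting.
            # Size-triggered: emit the older half so late records can still sort in.
            count = len(heap) if time_due else max(1, heap_dump_size // 2)
            for _ in range(count):
                yield heapq.heappop(heap)[2]
            last_flush = clock()
    # Source exhausted: drain whatever is left in timestamp order.
    while heap:
        yield heapq.heappop(heap)[2]
```

Note that this sketch only checks the clock when a new record arrives, so a source that blocks without producing records would still delay the flush. The trade-off is ordering versus latency: a time-triggered flush can emit a record before an older one from a slower source has arrived.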

Example Dag for Testing the get_log API Streaming Results

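import time
from uuid import uuid4

import pendulum
# Assumes Airflow 3, where DAG and task are exported from airflow.sdk.
from airflow.sdk import DAG, task

with DAG(
    dag_id="test_streaming_log",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
):

    @task()
    def produce_slowly():
        # Emit one line per second so the log file keeps growing while the task runs.
        print("Large task logs DAG starting")

        for _ in range(800):
            print(uuid4())
            time.sleep(1)

        print("Large task logs DAG done")

    # Call the task so that it is registered with the DAG.
    produce_slowly()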

@github-actions

github-actions bot commented Oct 1, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Oct 1, 2025
@jason810496
Member Author

up

@github-actions github-actions bot removed the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Oct 2, 2025

Labels

area:API (Airflow's REST/HTTP API), area:logging


1 participant